
    Separation logic for high-level synthesis

    High-level synthesis (HLS) promises a significant shortening of the digital hardware design cycle by raising the abstraction level of the design entry to high-level languages such as C/C++. However, applications using dynamic, pointer-based data structures remain difficult to implement well, even though such constructs are widely used in software. Automated optimisations that leverage the memory bandwidth of dedicated hardware implementations by distributing the application data over separate on-chip memories and parallelising the implementation are often ineffective in the presence of dynamic data structures, due to the lack of an automated analysis that disambiguates pointer-based memory accesses. This thesis takes a step towards closing this gap. We explore recent advances in separation logic, a rigorous mathematical framework that enables formal reasoning about the memory accesses of heap-manipulating programs. We develop a static analysis that automatically splits heap-allocated data structures into provably disjoint regions. Our algorithm focuses on dynamic data structures accessed in loops and is accompanied by automated source-to-source transformations which enable loop parallelisation and physical memory partitioning by off-the-shelf HLS tools. We then extend the scope of our technique to pointer-based, memory-intensive implementations that require access to an off-chip memory. The extended HLS design aid generates parallel on-chip multi-cache architectures. It uses the disjointness property of memory accesses to back non-overlapping memory regions with private caches. It also identifies regions which are shared after parallelisation and supports them with parallel caches, a coherency mechanism and synchronisation, resulting in automatically specialised memory systems. We show up to 15x acceleration from heap partitioning, parallelisation and the insertion of the custom cache system in demonstrably practical applications.
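
    As a concrete illustration of the kind of source-to-source rewrite the thesis describes, the sketch below shows a heap traversal before and after the analysis has split the data into provably disjoint regions; the data types, function names and the particular split are hypothetical and only hint at what the automated transformation produces.

```cpp
// Illustrative sketch (not the thesis tool's output): a traversal over a single
// monolithic heap is rewritten as two traversals over disjoint heap regions,
// which an HLS tool can then schedule in parallel and map to separate on-chip
// memories. All identifiers are assumptions for illustration.

struct Node { int value; Node *next; };

// Before partitioning: one pointer walks a list whose nodes come from several
// allocation sites, so every access targets the same monolithic memory.
void accumulate_original(Node *head, int *sum) {
    for (Node *p = head; p != nullptr; p = p->next)
        *sum += p->value;
}

// After partitioning: the analysis proves the two sub-lists occupy disjoint
// regions, so the traversal becomes two independent loops over separate lists
// (and, after memory partitioning, separate memory banks).
void accumulate_partitioned(Node *head_a, Node *head_b, int *sum_a, int *sum_b) {
    for (Node *p = head_a; p != nullptr; p = p->next)   // region / bank 0
        *sum_a += p->value;
    for (Node *p = head_b; p != nullptr; p = p->next)   // region / bank 1
        *sum_b += p->value;
}
```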

    Tigris: Architecture and Algorithms for 3D Perception in Point Clouds

    Machine perception applications are increasingly moving toward manipulating and processing 3D point clouds. This paper focuses on point cloud registration, a key primitive of 3D data processing that is widely used in high-level tasks such as odometry, simultaneous localization and mapping, and 3D reconstruction. As these applications are routinely deployed in energy-constrained environments, real-time and energy-efficient point cloud registration is critical. We present Tigris, an algorithm-architecture co-designed system specialized for point cloud registration. Through an extensive exploration of the registration pipeline design space, we find that, while different design points make vastly different trade-offs between accuracy and performance, KD-tree search is a common performance bottleneck and thus an ideal candidate for architectural specialization. While KD-tree search is inherently sequential, we propose an acceleration-amenable data structure and search algorithm that expose different forms of parallelism of KD-tree search in the context of point cloud registration. The co-designed accelerator systematically exploits this parallelism while incorporating a set of architectural techniques that further improve the accelerator's efficiency. Overall, Tigris achieves a 77.2× speedup and a 7.4× power reduction in KD-tree search over an RTX 2080 Ti GPU, which translates to a 41.7% registration performance improvement and a 3.0× power reduction. Comment: Published at MICRO-52 (52nd IEEE/ACM International Symposium on Microarchitecture); Tiancheng Xu and Boyuan Tian are co-primary authors.
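
    The abstract's point that parallelism can be exposed even though a single KD-tree traversal is sequential can be illustrated with a minimal sketch: in registration, each source point issues its own nearest-neighbour query into a KD-tree built over the target cloud, and these queries are mutually independent. The code below is an assumption-laden illustration of that query-level parallelism, not the Tigris data structure or accelerator.

```cpp
// Hedged sketch: query-level parallelism in KD-tree correspondence search.
#include <array>
#include <limits>
#include <vector>

struct KdNode {
    std::array<float, 3> point;
    int axis;                  // split dimension at this node
    KdNode *left = nullptr;
    KdNode *right = nullptr;
};

// Standard recursive nearest-neighbour search with bounding-plane pruning.
void nearest(const KdNode *node, const std::array<float, 3> &q,
             const KdNode *&best, float &best_d2) {
    if (!node) return;
    float d2 = 0.f;
    for (int i = 0; i < 3; ++i) {
        float d = node->point[i] - q[i];
        d2 += d * d;
    }
    if (d2 < best_d2) { best_d2 = d2; best = node; }
    float diff = q[node->axis] - node->point[node->axis];
    const KdNode *near_side = diff < 0 ? node->left : node->right;
    const KdNode *far_side  = diff < 0 ? node->right : node->left;
    nearest(near_side, q, best, best_d2);
    if (diff * diff < best_d2)          // only visit the far side if it can hold a closer point
        nearest(far_side, q, best, best_d2);
}

// One correspondence search per source point: each iteration is independent,
// which is the parallelism an accelerator (or a GPU/multicore baseline) exploits.
std::vector<const KdNode *> correspondences(const KdNode *tree,
                                            const std::vector<std::array<float, 3>> &source) {
    std::vector<const KdNode *> out(source.size(), nullptr);
    for (size_t i = 0; i < source.size(); ++i) {     // independent queries
        float best_d2 = std::numeric_limits<float>::max();
        nearest(tree, source[i], out[i], best_d2);
    }
    return out;
}
```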

    Building Professionally-Based Communities of Learning among Faculty, Students, and Practitioners

    Residential and non-residential “communities of learning” have been used within institutions of higher education as formal methods to enhance interactions among individuals and ultimately improve learning. Typically, these communities have included student-to-student and faculty-to-student interactions within residential living areas, teams in a core of courses, or teams of students within a course. If students are to develop into leaders within their respective disciplines, an additional component that should be integrated into communities of learning is practitioners. The objectives of our paper are to describe: 1) communities of learning and why they should be established for all students to enhance learning, 2) how to integrate a community of learning into its respective community of practice, 3) models of communities of learning and their characteristics, and 4) what roles natural resource practitioners, faculty, and students can play in developing and maintaining non-residential communities of learning to meet academic and professional objectives. Ultimately, the integration of faculty, students, and practitioners in developing and maintaining learning communities will help create an educational culture that produces life-long learners and leaders in natural resources.

    Separation Logic-Assisted Code Transformations for Efficient High-Level Synthesis

    Abstract—The capabilities of modern FPGAs permit the mapping of increasingly complex applications into reconfigurable hardware. High-level synthesis (HLS) promises a significant shortening of the FPGA design cycle by raising the abstraction level of the design entry to high-level languages such as C/C++. Applications using dynamic, pointer-based data structures and dynamic memory allocation, however, remain difficult to implement well, yet such constructs are widely used in software. Automated optimizations that aim to leverage the increased memory bandwidth of FPGAs by distributing the application data over separate banks of on-chip memory are often ineffective in the presence of dynamic data structures, due to the lack of an automated analysis of pointer-based memory accesses. In this work, we take a step towards closing this gap. We present a static analysis for pointer-manipulating programs which automatically splits heap-allocated data structures into disjoint, independent regions. The analysis leverages recent advances in separation logic, a theoretical framework for reasoning about heap-allocated data which has been successfully applied in recent software verification tools. Our algorithm focuses on dynamic data structures accessed in loops and is accompanied by automated source-to-source transformations which enable automatic loop parallelization and memory partitioning by off-the-shelf HLS tools. We demonstrate the successful loop parallelization and memory partitioning by our tool flow using three real-life applications which build, traverse, update and dispose of dynamically allocated data structures. Our case studies, comparing the automatically parallelized to the non-parallelized HLS implementations, show an average latency reduction by a factor of 2.5 across our benchmarks. Keywords—FPGA; high-level synthesis; memory system; dynamic data structures; separation logic; static analysis
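
    A minimal sketch, assuming a Vivado HLS-style tool, of the code shape the automated transformations aim for: once separation logic proves that two traversals touch disjoint heap regions, they are hoisted into independent sub-functions that the HLS scheduler may run concurrently (for example under a dataflow directive) and map to separate memory partitions. Identifiers are illustrative and not taken from the paper.

```cpp
// Hedged sketch of the post-transformation code shape (not the paper's output).

struct Node { int key; Node *next; };

// Traversal over one provably private heap region.
static int sum_list(Node *head) {
    int s = 0;
    for (Node *p = head; p != nullptr; p = p->next)
        s += p->key;
    return s;
}

// Top-level function after the transformation: the two calls share no heap
// cells, so the tool is free to overlap them in time and in memory banks.
void top(Node *list0, Node *list1, int *out0, int *out1) {
#pragma HLS dataflow
    *out0 = sum_list(list0);
    *out1 = sum_list(list1);
}
```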

    FPGA-BASED K-MEANS CLUSTERING USING TREE-BASED DATA STRUCTURES

    K-means clustering is a popular technique for partitioning a data set into subsets of similar features. Due to their simple control flow and inherent fine-grain parallelism, K-means algorithms are well suited for hardware implementations, such as on field programmable gate arrays (FPGAs), to accelerate the computationally intensive calculation. However, the available hardware resources in massively parallel implementations are easily exhausted for large problem sizes. This paper presents an FPGA implementation of an efficient variant of K-means clustering which prunes the search space using a binary kd-tree data structure to reduce the computational burden. Our implementation uses on-chip dynamic memory allocation to ensure efficient use of memory resources. We describe the trade-off between data-level parallelism and search space reduction at the expense of increased control overhead. A data-sensitive analysis shows that our approach requires up to five times fewer computational FPGA resources than a conventional massively parallel implementation for the same throughput constraint.
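
    The search-space pruning the abstract refers to follows the spirit of kd-tree (filtering) K-means: while descending the tree, a candidate centre can be dropped for an entire cell if no point in the cell's bounding box can be closer to it than to the current best centre. The sketch below shows that pruning test in software form; it is an illustrative assumption, not the paper's FPGA implementation, and all names and the dimensionality are made up.

```cpp
// Hedged sketch of candidate-centre pruning over a kd-tree cell.
#include <array>
#include <vector>

constexpr int D = 2;                       // dimensionality (assumption)
using Point = std::array<float, D>;

static float dist2(const Point &a, const Point &b) {
    float s = 0.f;
    for (int i = 0; i < D; ++i) { float d = a[i] - b[i]; s += d * d; }
    return s;
}

// True if candidate centre 'cand' is dominated by 'best' over the cell [lo, hi]:
// even the cell vertex pushed as far as possible towards 'cand' is at least as
// close to 'best', so 'cand' can be pruned for the whole subtree.
static bool dominated(const Point &cand, const Point &best,
                      const Point &lo, const Point &hi) {
    Point v;
    for (int i = 0; i < D; ++i)
        v[i] = (cand[i] > best[i]) ? hi[i] : lo[i];
    return dist2(v, cand) >= dist2(v, best);
}

// Prune the candidate set for one kd-tree cell (candidates must be non-empty).
// If a single centre survives, every point in the subtree is assigned to it
// without further distance computations, which is the search-space reduction.
std::vector<int> prune_candidates(const std::vector<Point> &centres,
                                  const std::vector<int> &candidates,
                                  const Point &lo, const Point &hi) {
    Point mid;                                      // reference: centre closest to cell midpoint
    for (int i = 0; i < D; ++i) mid[i] = 0.5f * (lo[i] + hi[i]);
    int best = candidates[0];
    for (int c : candidates)
        if (dist2(centres[c], mid) < dist2(centres[best], mid)) best = c;

    std::vector<int> kept;
    for (int c : candidates)
        if (c == best || !dominated(centres[c], centres[best], lo, hi))
            kept.push_back(c);
    return kept;
}
```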

    Custom Multicache Architectures for Heap Manipulating Programs
